The application of pyramid-like data structures to multi-dimensional
queries has been explored in three recent papers (BS-77, Lu-78, Wi-78).
It will be shown here that many of the earlier results (including some
of our own) can be improved by a factor of log N with a slightly
modified data structure that enables k-dimensional searches to be
performed in O(log^{k-1} N) time. The revised pyramid structure can be
made sufficiently efficient in a dynamic environment to have an
O(log^{k-1} N) record-insertion and deletion runtime. Queries for the
special two-dimensional version of the proposed pyramid will have the
same combination of O(log N) retrieval, insertion and deletion runtimes
that has traditionally
coefficient associated with its memory space utilization will only
be approximately 5 larger than that of the otherwise considerably
less efficient pyramids of BS-77, Lu-78 and Wi-78. Also, it will
be shown here how the combination of the concepts of this paper
along with Be-75, Ri-76, Wi-78 and Wi-78a can be used to develop
very useful partial match data structures.


NEW DATA STRUCTURES FOR ORTHOGONAL QUERIES


By Dan E. Willard
Harvard University


There has long been an apparent need for an efficient data
structure which supports retrievals on a conjunction of range
predicates similar to

a1 ≤ KEY.1 ≤ b1 & a2 ≤ KEY.2 ≤ b2 & ... & ak ≤ KEY.k ≤ bk

Following Knuth's suggestion (Kn-73), a series of articles has
appeared within the last five years discussing this problem
in the context of a data structure which occupies O(N) space
(FB-74, BS-75, Be-75, LW-77, Wi-78a). More recently, several
papers have begun to appear which discuss the improved retrieval
time resulting from an allocation of O(N log^{k-1} N) memory space
(BS-77, Wi-78, Lu-78). This article will show how a subtle
change produces a dramatic improvement in the pyramid-like
data structure of the latter series of articles.

In our discussion, L will denote the initial list of elements,
L_v the subset of L that descends from tree-node v, and P0(k,L)
the k-dimensional pyramid structure that was advocated in the
previous articles. This pyramid will be inductively defined
according to the value of k as follows:

1) If k=1, then the corresponding P0(k,L) pyramid will be
defined as a tree representation of list L that has
an O(log N) height and has sorted its records by
increasing KEY.1 value.

2) Given that k-1 dimensional pyramids are previously
defined, P0(k,L) will be inductively defined as a
tree with O(log N) height that has the records of L
sorted by increasing KEY.k value and which additionally
associates each interior node v with an auxiliary
P0(k-1, L_v) pyramid. This auxiliary pyramid was
called an SDS field in Wi-78.
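
The two-rule definition above can be illustrated with a short Python
sketch. It is only an illustration under stated assumptions: the names
Node, build_p0, leaves and height are hypothetical, the tree is built
statically by halving a sorted list (no balancing under updates), and
records are plain tuples with KEY.j stored at index j-1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[float, ...]  # (KEY.1, ..., KEY.k); KEY.j is stored at index j-1


@dataclass
class Node:
    size: int                          # number of records in L_v
    record: Optional[Record] = None    # set on leaves only (one leaf per record)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    sds: Optional["Node"] = None       # auxiliary P0(k-1, L_v), interior nodes only


def build_p0(k: int, records: List[Record]) -> Optional[Node]:
    """Build P0(k, L) as a balanced tree whose leaves are sorted by KEY.k."""
    if not records:
        return None
    return _build(k, sorted(records, key=lambda r: r[k - 1]))


def _build(k: int, ordered: List[Record]) -> Node:
    if len(ordered) == 1:
        return Node(size=1, record=ordered[0])
    mid = len(ordered) // 2
    node = Node(size=len(ordered),
                left=_build(k, ordered[:mid]),
                right=_build(k, ordered[mid:]))
    if k > 1:
        # Rule 2: each interior node v carries a (k-1)-dimensional pyramid on L_v.
        node.sds = build_p0(k - 1, ordered)
    return node


def leaves(node: Optional[Node]) -> List[Record]:
    """Records at the leaves, read left to right."""
    if node is None:
        return []
    if node.record is not None:
        return [node.record]
    return leaves(node.left) + leaves(node.right)


def height(node: Optional[Node]) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if node is None or node.record is not None:
        return 0
    return 1 + max(height(node.left), height(node.right))
```

In this sketch a record appears once in the KEY.k tree and once in the
auxiliary pyramid of each of its O(log N) interior ancestors, which is
where the extra logarithmic factor in memory space comes from.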

For a query q of the canonical form

a1 ≤ KEY.1 ≤ b1 & a2 ≤ KEY.2 ≤ b2 & ... & ak ≤ KEY.k ≤ bk

the following terminology will be used:

i) SET(q) will denote the subset of the
initial file that satisfies q

ii) COUNT(q) will denote the number of
records belonging to SET(q)

iii) given a previously defined function P,
SUM(q) will denote the sum of the
P-values of those records belonging
to SET(q)
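
As a point of reference, the three quantities can be computed by a
direct scan of the file. The sketch below is a brute-force baseline in
O(N) time, not one of the pyramid algorithms. The names satisfies,
set_q, count_q and sum_q are hypothetical, and closed range bounds are
assumed.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, ...]             # (KEY.1, ..., KEY.k)
Query = Sequence[Tuple[float, float]]  # [(a1, b1), ..., (ak, bk)], closed bounds assumed


def satisfies(record: Record, q: Query) -> bool:
    """True when every key of the record lies inside its range predicate."""
    return all(a <= key <= b for key, (a, b) in zip(record, q))


def set_q(file: List[Record], q: Query) -> List[Record]:
    """SET(q): the records of the file that satisfy q."""
    return [r for r in file if satisfies(r, q)]


def count_q(file: List[Record], q: Query) -> int:
    """COUNT(q): the number of records in SET(q)."""
    return len(set_q(file, q))


def sum_q(file: List[Record], q: Query, value: Callable[[Record], float]) -> float:
    """SUM(q): the sum of value(r) over the records r in SET(q)."""
    return sum(value(r) for r in set_q(file, q))
```

A scan of this kind is useful mainly as a correctness check against
which the faster structures can be compared.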

The "locate-and-copy" time of a specified retrieval algorithm
will be defined as the amount of runtime needed by the procedure
to find and transfer the members of SET(q) into the user's work
space. This concept is not very useful, because the degenerate
case where COUNT(q) = N forces all procedures to have an O(N)
worst-case locate-and-copy time. Consequently, another notion
will be necessary in our worst-case analysis, and this paper
will rely on the following two measurements:

i) the "locate" retrieval time of a search procedure
will be defined as the difference obtained when
subtracting COUNT(q) from the locate-and-copy time
(worst-case analysis of locate runtime is meaningful
because this quantity has been automatically adjusted
to avoid the trivial degeneration that results when
COUNT(q) is a large quantity)

ii) the "aggregate-scan" time of a retrieval algorithm is
defined as the amount of time needed to scan the
SET(q) collection of records for the purpose of
calculating one of its aggregate values, such as
SUM(q) or COUNT(q).

The application of the above two concepts to P0(k)
pyramids was discussed in BS-77, Lu-78 and Wi-78. Some of
the results obtained in these papers were quite similar, since
they were written during overlapping time periods. What was
known about pyramids previous to this article is given below:

1) SUM(q) and COUNT(q) can be calculated in O(log^k N)
worst-case aggregate-scan time (BS-77, Lu-78, Wi-78).

2) SET(q) can be calculated in O(log^k N) worst-case locate
time (observed in Wi-78 as a straightforward generalisation
of item 1).

3) If L is initially the empty set, and if a sequence of N
insertion and deletion commands is subsequently given,
then the total time needed to dynamically adjust P0(k,L)
in response to this command sequence will have a worst-case
O(N log^k N) magnitude. (First proposed in Wi-78.
Several months later an independent derivation of a basically
similar procedure was presented in a conference as Lu-78.)

4) The above result can be strengthened to indicate the
existence of a procedure that executes individual
insertion and deletion commands in O(log^k N) worst-case
time (Wi-78; also in Wi-78b).

5) Several of the above results can have their runtime
reduced by a factor of log N in a batch environment
where N operations are simultaneously performed.
Such batch procedures include:

5a) an algorithm that constructs an entire P0(k,L)
data structure in N log^{k-1} N time (BS-77, Lu-78)

5b) a procedure that calculates ECDF statistics in
N log^{k-1} N time (BS-77)

5c) given N queries q1 ... qN, a procedure
that calculates their SUM(q) and COUNT(q) values
in N log^{k-1} N time (Wi-78)

The discussion in this paper will focus on topics 1 through
4 rather than the batch algorithms of topic 5. It will be shown
here that the runtimes associated with topics 1-4 can almost be
reduced by a factor of log N, thus deriving the new magnitude of
O(log^{k-1} N). We say "almost" because the criterion used for
measuring runtime here is slightly weaker than that in Wi-78
and the previous references. The distinction is that the
earlier papers discussed worst-case optimization in a dynamic
environment, whereas the improved results of this paper are
either expected runtimes in a dynamic environment or worst-case
runtimes in a static environment. Our new algorithm can be
controlled to ensure that its worst-case performance will
always be at least as efficient as that of Wi-78.

The symbols Ps(k), Pc(k) and Pd(k) will denote the three
modified versions of the P0(k) pyramid proposed in this article.
All three will occupy the same O(N log^{k-1} N) quantity of memory
space previously associated with P0(k), and each will solve a
slightly different type of optimization problem. Below are
listed the three main results that will be proven in this paper:

Theorem 1: The Ps(k) pyramid (of definitions 2 and 5)
will provide a static environment where SUM(q), COUNT(q)
and SET(q) can be evaluated in O(log^{k-1} N) worst-case time.

Theorem 2: The Pc(k) pyramid (of definitions 1 and 5)
will provide a partially dynamic environment where SET(q)
can be located in O(log^{k-1} N) worst-case time and where records
can be inserted or deleted in O(log^{k-1} N) expected time.

Theorem 3: The Pd(k) pyramid (of definitions 3 through 5)
will provide a fully dynamic environment where record
insertions, record deletions, and retrievals of SET(q)
can be executed in O(log^{k-1} N) expected time and O(log^k N)
worst-case runtime. (A comparison of theorems 2 and 3
indicates that Pd(k) has better update and worse retrieval
time than Pc(k).)

In addition to discussing the above three classes of pyramids,
this paper will also make brief mention of a new type of partial
match and partial region query data structure which is quite
similar to these pyramids. This new data structure (discussed
in section 2) will enable the user to improve retrieval time by
allocating O(N log N) additional units of space.


PART 1

The algorithms in this paper will make frequent subroutine-calls
to the super-B-tree procedure introduced in Wi-78 (soon to
be widely disseminated in Wi-78b). Because of its importance,
the next several paragraphs will summarize the nature of this
super-B-tree procedure.

In the forthcoming discussion as well as throughout this
paper, it will be assumed that our trees have been structured
so that there exists a one-to-one correspondence between the
leaves of the tree and the records of the list it represents
(as opposed to a pairing between general nodes and records).

An SDS field will be defined as any auxiliary data structure
which the user has created for the purpose of describing the
descendants of a given interior node. A tree (which is a
representation of a sorted list with O(log N) height) will
be called an augmented tree if it assigns an SDS field to
each of its interior nodes. For instance, the P0(k) pyramids
(whose definitions were given in the second paragraph of this
paper) are examples of augmented trees.

The super-B-tree theorem describes the worst-case amount
of runtime needed to insert and delete a record in augmented
trees, in terms of a parameter w that denotes the amount of
runtime needed to insert or delete a single record in an SDS
field. The theorem states that arbitrary insertion and deletion
operations can be performed within O(w log N) worst-case runtime.
This result is significant because the super-B-tree procedure
simultaneously ensures that the augmented tree will have O(log N)
worst-case height, and that no insertion or deletion command can
cause the runtime involved in adjusting SDS fields to exceed
the O(w log N) worst-case upper bound. (A traditional B-tree
algorithm (AVL-62, NR-73, AHU-74) will not satisfy this condition
because O(wN) worst-case time will be spent adjusting the SDS
fields when "rebalancing" is performed.)

This paper's discussion of pyramids will begin with the
Pc(2,L) because it is the simplest of our various pyramids.
The definition of Pc(2,L) is given below:

Definition 1: A Pc(2,L) pyramid will be defined as a
two-part data structure consisting of a dictionary D and
an augmented tree T. The former will be defined as a B-tree
which has its records sorted by KEY.1 and which possesses
pointers that map each record of the dictionary onto the
location where the record is stored in the SDS field of
the root of T. Here T will be defined as a tree which has
its records sorted by KEY.2 and which uses the following
rules to define the SDS fields of each of its nodes v:

A) SDS(v) will be a doubly-chained sorted list which has
taken v's descendants (in T) and arranged them by
order of increasing KEY.1 value.

B) In addition to containing its name, the entry for
record R in SDS(v) will contain the following information:

i) pointers to the predecessor and successor of R
in this SDS field

ii) a "LEFT.DOWN.POINTER" that contains the address
of the least record in the SDS field of v's left
son whose KEY.1 value is greater than or equal to
that of R

iii) a "RIGHT.DOWN.POINTER" that similarly contains the
address of the least record in the SDS field of
v's right son whose KEY.1 value is greater than
or equal to that of R

Our first lemma will discuss retrieval operations in Pc(2,L)
pyramids. In that discussion, as well as elsewhere in this paper,
it will be necessary to speak of the nodes which are "critical"
with respect to a range predicate such as a ≤ KEY ≤ b. An interior
node v of a specified tree will be defined as critical whenever
the following two conditions hold:

i) all leaf-records that descend from v satisfy this range
condition;

ii) the same is not true for v's father (in other words, one
of the father's descendants does not satisfy the range
condition).
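
The notion of a critical node can be made concrete with a small Python
sketch. It assumes, for illustration only, that the tree is the
implicit balanced tree over a sorted list of key values, with each node
identified by the half-open index range [lo, hi) of the leaves below it
and split at its midpoint. The function name critical_nodes is
hypothetical, and the sketch also reports a single leaf when that leaf
is the maximal satisfying subtree.

```python
from typing import List, Tuple


def critical_nodes(keys: List[float], a: float, b: float) -> List[Tuple[int, int]]:
    """Return the index ranges of the nodes critical with respect to a <= KEY <= b.

    `keys` must be sorted in increasing order.
    """
    found: List[Tuple[int, int]] = []

    def visit(lo: int, hi: int) -> None:
        if lo >= hi or keys[hi - 1] < a or keys[lo] > b:
            return                      # no leaf below this node satisfies the predicate
        if a <= keys[lo] and keys[hi - 1] <= b:
            # Every leaf below satisfies the predicate, and the father did not
            # (otherwise the recursion would have stopped at the father).
            found.append((lo, hi))
            return
        mid = (lo + hi) // 2            # only partially covered: examine both sons
        visit(lo, mid)
        visit(mid, hi)

    visit(0, len(keys))
    return found
```

For 16 keys 0..15 and the predicate 3 ≤ KEY ≤ 12, the sketch returns
four ranges, (3,4), (4,8), (8,12) and (12,13), which together cover
exactly the ten satisfying leaves. At most two nodes per level of the
tree can be critical, so their number is O(log N).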

Lemma 1: Let q denote a two-dimensional query of the form
a1 ≤ KEY.1 ≤ b1 & a2 ≤ KEY.2 ≤ b2. In the context of the Pc(2,L)
pyramid:

A) search operations for SET(q) can be executed in O(log N)
worst-case locate time

B) insertions and deletions can be executed in O(log N)
expected runtime

Proof: Only proposition A requires verification, since B
is rendered trivial by the fact that it discusses only expected
runtime. In our proof of A, the symbol INF(a1,v) will denote the
least record in SDS(v) whose KEY.1 ≥ a1. The search procedure
that A needs to locate SET(q) will consist of the following
three steps:

1) Find the address of INF(a1, root of T) in log N time
(by using dictionary D).

2) Let INF(a1, critical) denote the union of the INF(a1,v)
elements of those nodes v in T that are critical with
respect to a2 ≤ KEY.2 ≤ b2. This step will construct the
INF(a1, critical) set in O(log N) time by using the
binary tree that is rooted at INF(a1, root of T) and
generated by the LEFT.DOWN and RIGHT.DOWN pointers.

3) Construct the sought-after SET(q) in COUNT(q) runtime
by making the obvious walk down the list of "successor"
pointers that is generated by INF(a1, critical).

The above algorithm obviously performs locate-and-copy
operations for SET(q) in O(log N + COUNT(q)) time. Subtracting
COUNT(q) from this quantity, we obtain the result that SET(q)
has an O(log N) locate runtime. QED
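
The three-step search of this proof can be sketched for the static
case as follows. This is an illustration under several assumptions
rather than the structure of Definition 1 itself: each SDS field is a
Python list sorted by KEY.1 instead of a doubly-chained list, the down
pointers are list indices, a binary search on the root's SDS list
stands in for dictionary D, and no insertions or deletions are
supported. The names Node, build and query are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Record = Tuple[float, float]  # (KEY.1, KEY.2)


@dataclass
class Node:
    lo2: float                          # smallest KEY.2 below this node
    hi2: float                          # largest KEY.2 below this node
    sds: List[Record]                   # descendants sorted by KEY.1
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    left_down: List[int] = field(default_factory=list)
    right_down: List[int] = field(default_factory=list)


def _down(parent: List[Record], child: List[Record]) -> List[int]:
    """For each parent position (plus an end sentinel), the index of the least
    child record whose KEY.1 is >= the KEY.1 of the parent record."""
    ptrs, j = [], 0
    for rec in parent:
        while j < len(child) and child[j][0] < rec[0]:
            j += 1
        ptrs.append(j)
    ptrs.append(len(child))             # pointer used when INF does not exist
    return ptrs


def _build(by_key2: List[Record]) -> Node:
    node = Node(by_key2[0][1], by_key2[-1][1], sorted(by_key2))
    if len(by_key2) > 1:
        mid = len(by_key2) // 2
        node.left, node.right = _build(by_key2[:mid]), _build(by_key2[mid:])
        node.left_down = _down(node.sds, node.left.sds)
        node.right_down = _down(node.sds, node.right.sds)
    return node


def build(records: List[Record]) -> Optional[Node]:
    by_key2 = sorted(records, key=lambda r: r[1])
    return _build(by_key2) if by_key2 else None


def query(root: Optional[Node], a1: float, b1: float, a2: float, b2: float) -> List[Record]:
    """SET(q) for a1 <= KEY.1 <= b1 and a2 <= KEY.2 <= b2."""
    out: List[Record] = []
    if root is None:
        return out

    def visit(node: Node, i: int) -> None:      # i is the position of INF(a1, node)
        if node.hi2 < a2 or node.lo2 > b2:
            return
        if a2 <= node.lo2 and node.hi2 <= b2:   # critical node: step 3, walk successors
            while i < len(node.sds) and node.sds[i][0] <= b1:
                out.append(node.sds[i])
                i += 1
            return
        visit(node.left, node.left_down[i])     # step 2: follow the down pointers
        visit(node.right, node.right_down[i])

    # Step 1: (a1,) sorts before every record whose KEY.1 equals a1.
    visit(root, bisect_left(root.sds, (a1,)))
    return out
```

Only one binary search is performed, at the root. Every other
INF(a1,v) position is obtained in constant time from a down pointer,
which is what removes a factor of log N from the search. The sketch
re-sorts each SDS list during construction for brevity; merging the
sons' lists would be the more economical choice.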
The next objective of this paper will be to design and study
a new pyramid that is capable of efficiently calculating SUM(q)
and COUNT(q) values. This pyramid will be called Ps(2,L) and
is defined below:

Definition 2: The Ps(2,L) pyramid will be defined as containing
all the information of Pc(2,L) plus two additional fields for each
record R stored in SDS(v). These fields will be denoted as SUM(R)
and COUNT(R). In the context of v's SDS field, these fields will
specify the respective SUM and COUNT of the subset of SDS(v) whose
KEY.1 value is greater than or equal to the KEY.1 value of R.

Lemma 2: In addition to satisfying part A of Lemma 1, the
Ps(2,L) pyramids will enable SUM(q) and COUNT(q) to be calculated
in O(log N) worst-case aggregate-scan time.


Proof: Using reasoning similar to Lemma 1, it can be verified
that all the members of the INF(a1, critical) and
INF(b1, critical) sets can be found in log N time. The present
lemma follows from this observation and the fact that

SUM(q) = \sum_{x \in INF(a1, critical)} SUM(x) - \sum_{y \in INF(b1, critical)} SUM(y)

COUNT(q) = \sum_{x \in INF(a1, critical)} COUNT(x) - \sum_{y \in INF(b1, critical)} COUNT(y)

QED
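
The subtraction used in this proof can be illustrated on a single SDS
field. The sketch below is an assumption-laden illustration: the SDS
field is a Python list of (KEY.1, value) pairs sorted by KEY.1, the
SUM and COUNT fields are stored as suffix arrays with a zero sentinel,
and binary searches stand in for the positions that the pyramid would
obtain from its down pointers. To honour the closed upper bound
KEY.1 ≤ b1, the sketch subtracts at the first record strictly beyond
b1, which is a choice of this sketch. The names suffix_fields and
aggregate are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Entry = Tuple[float, float]  # (KEY.1, value whose SUM is wanted)


def suffix_fields(sds: List[Entry]) -> Tuple[List[int], List[float]]:
    """COUNT and SUM of the records at or after each position of a sorted SDS list."""
    n = len(sds)
    count = [0] * (n + 1)
    total = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        count[i] = count[i + 1] + 1
        total[i] = total[i + 1] + sds[i][1]
    return count, total


def aggregate(sds: List[Entry], count: List[int], total: List[float],
              a1: float, b1: float) -> Tuple[int, float]:
    """COUNT and SUM of the entries with a1 <= KEY.1 <= b1, by one subtraction each."""
    lo = bisect_left(sds, (a1,))                  # least record with KEY.1 >= a1
    hi = bisect_right(sds, (b1, float("inf")))    # first record with KEY.1 > b1
    hi = max(hi, lo)                              # empty range when a1 > b1
    return count[lo] - count[hi], total[lo] - total[hi]
```

Applying the same subtraction at each of the O(log N) critical nodes
and adding the results gives SUM(q) and COUNT(q) without visiting the
members of SET(q) individually.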


The next goal of this section will be to design a pyramid
that optimises on worst-case insertion and deletion time in
addition to the expected time optimization mentioned in Lemma 1.
The proposed pyramid will be called Pd(2,L). Our discussion
commences with the following preliminary definition:
Definition 3: Let s denote an interior node of an augmented
tree that is contained in a Pc(2,L) pyramid, y a record in SDS(s),
x the predecessor of y in this SDS field, and v the father of s.
The symbol ASSOC(s,y) will denote the subset of SDS(v) whose
records R satisfy the inequality KEY.1(x) < KEY.1(R) ≤ KEY.1(y).

Definition 4: A Pd(2,L) pyramid will be defined as having a
data structure identical to Pc(2,L) in all respects but one.
The distinction is that the Pd(2,L) pyramid will not have any
LEFT.DOWN.POINTER or RIGHT.DOWN.POINTER fields. Instead, each
member of an SDS field of Pd(2,L) will contain two new fields
called LEFT.DOWN.LEAF and RIGHT.DOWN.LEAF, such that

i) each record y belonging to the SDS field of the left
son of v will be associated with a 2-3 tree whose
leaves are the LEFT.DOWN.LEAVES of the ASSOC(left son of v, y)
set and whose root points to y;

ii) each record y belonging to the SDS field of the right
son of v will be associated with a similar 2-3 tree
whose leaves are the RIGHT.DOWN.LEAVES of
ASSOC(right son of v, y).

The above 2-3 trees of the Pd(2,L) pyramid will henceforth
be called mapping trees. The runtime characteristics of this
pyramid will be discussed in the next lemma. The proof of that
lemma assumes that the reader is familiar with the characteristics
of 2-3 trees that were discussed in AHU-74.


Lemma 3: Each of the following operations can be performed
in O(log N) expected and O(log^2 N) worst-case time with the use
of a Pd(2,L) pyramid:

A) searches for SET(q);

B) insertions and deletions


Proof of A: The algorithm for performing searches in Pd(2,L)
pyramids is identical to that of Pc(2,L), except that the former
will traverse a path from a DOWN.LEAF to the root of the associated
mapping tree on those occasions when the latter would simply
advance to the position indicated by the corresponding LEFT or
RIGHT.DOWN.POINTER. This difference cannot increase the runtime
of the Pd(2,L) procedure by a factor of more than log N (since
2-3 trees have log N worst-case heights). Furthermore, expected
retrieval time should not increase at all, since the mapping
trees in the present application will have an O(1) expected height.
Thus the previous Lemma 1 implies that Pd(2,L) will have O(log N)
expected and O(log^2 N) worst-case retrieval times. QED

Proof of B: It is sufficient to confirm the proposition only
for the deletion algorithm, since the insertion procedure is
similar. Upon the user's command to delete a record R, the
following three-step procedure will be executed.

1) Utilize dictionary D (of Definition 1) and the mapping
trees to perform a straightforward search that finds
all the entries for record R in the SDS fields of the
Pd(2,L) pyramid.

2) Repeatedly execute the following three substeps in order
to remove R from each of the above SDS fields:

a) Delete the LEFT.DOWN and RIGHT.DOWN leaves of R
from their mapping trees;

b) Merge the old mapping tree whose root pointed to R
into the mapping tree whose root points to R's
immediate predecessor (in the relevant SDS field);

c) Deallocate R's memory space in the SDS field and
make the predecessor and successor fields of its
predecessor and successor point to each other.

3) Remove record R from Pd(2,L)'s augmented tree and use
the super-B-tree algorithm to rebalance the augmented
tree (so that it retains its O(log N) height).

The O(log^2 N) worst-case runtime of steps 1 and 2 can be
understood given the observation that the time-consuming parts
of these steps consisted of O(log N) invocations of certain
specific 2-3 tree manipulation algorithms for which AHU-74
has verified an O(log N) worst-case runtime. The super-B-tree
theorem indicates that the amount of time needed by step 3's
rebalancing procedure must have the same magnitude as the SDS
field updating which takes place in step 2. Thus the combined
runtime of all three steps of our deletion algorithm has the
O(log^2 N) worst-case magnitude which Lemma 3 attributed to it.

Similar reasoning can be used to confirm the O(log N)
expected runtime of deletion operations. In essence, this
runtime follows from the O(1) expected heights of the mapping
trees. QED


The final goal of this section will be to generalize lemmas
1 through 3 for k-dimensional pyramids. Below is our definition
of k-dimensional pyramids:

Definition 5: Let Pi denote one of the symbols Pc, Ps or Pd,
and let L_v denote the subset of list L that is a descendant
of v. The symbol Pi(k,L) will denote a typical k-dimensional
pyramid representation of L. If k ≥ 3, then the associated
Pi(k,L) pyramid will be inductively defined as an augmented
tree sorted by KEY.k whose SDS fields equal Pi(k-1, L_v).

Claim 1: The portions of Theorems 1 through 3 that discuss
retrieval times of k-dimensional pyramids are valid. (The
statement of these theorems can be found in the introductory
portion of this paper.)

Proof: For k ≥ 3, consider a retrieval procedure that locates
the nodes which are critical with respect to ak ≤ KEY.k ≤ bk and
that recursively calls itself to search the SDS fields of these
nodes. It is trivial to verify that such a procedure will cause
Pi(k) pyramids to have a retrieval time that exceeds Pi(k-1) by
a factor of log N. This fact, combined with Lemmas 1 through 3,
easily inductively verifies the claim. QED

Claim 2: The portions of Theorems 1 through 3 that discuss
insertion and deletion runtime are also valid.

Proof: The super-B-tree theorem implies that the update time
of a Pi(k) pyramid will exceed that of Pi(k-1) by a factor of
log N. The claim follows from the conjunction of this fact,
the principle of induction, and Lemmas 1 through 3.

QED
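
The induction behind Claims 1 and 2 can be written out as a short
recurrence. The symbols T_k, c and e are introduced here only for
illustration: T_k(N) stands for any one of the retrieval or update
runtimes of a k-dimensional pyramid, c for the constant hidden in the
"factor of log N", and e for the exponent supplied by the
two-dimensional lemma (e = 1 for the bounds of Lemmas 1 and 2 and for
the expected bounds of Lemma 3, e = 2 for the worst-case bounds of
Lemma 3).

```latex
T_k(N) \le c \,\log N \cdot T_{k-1}(N) \quad (k \ge 3),
\qquad T_2(N) = O(\log^{e} N)
\;\Longrightarrow\;
T_k(N) = O\!\left(c^{\,k-2} \log^{\,k-2+e} N\right)
       = O\!\left(\log^{\,k-2+e} N\right) \quad \text{for fixed } k .
```

With e = 1 this gives the O(log^{k-1} N) bounds of Theorems 1 through
3, and with e = 2 it gives the O(log^k N) worst-case bound of
Theorem 3.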

Although the discussion in this section was centered on
measurements of CPU runtime, the proposed data structures are
also useful in minimizing disc accesses. To illustrate this
point, we consider the Pc(2) pyramid.

In a paging environment, the SDS fields of Pc(2) pyramids
should be arranged so that consecutive records in their sorted
lists appear on the same page. Let k denote the average number
of records stored on a typical page, and r the fraction of the
file's total records that satisfy c ≤ KEY.2 ≤ d. A full
locate-and-copy operation to retrieve the records satisfying
a ≤ KEY.1 ≤ b AND c ≤ KEY.2 ≤ d from a Pc(2) pyramid will require
C1 log N + COUNT(q)/k worst-case page accesses (for some small
constant C1). In contrast, the same query would require
C2 log N + COUNT(q)/(rk) expected page accesses with a
(KEY.1-sorted) B-tree or some other conventional method of
organizing a file. As r is always less than one and usually very
small, the Pc(2) pyramids produce a clear gain in efficiency.




Note Added February 15

At the time when the November draft of this paper was completed,
I was aware of the possibility of slightly modifying the
structure of the Pd pyramids by giving them mapping trees with a
"multiway" rather than 2-3 structure. Multiway trees have been
described in Kn-73, and they are the generalization of 2-3 trees
that assign each interior node between M and 2M-1 sons (for some
fixed M). The employment of multiway mapping trees in the context
of Pd pyramids would produce an improvement in the coefficient
associated with retrieval time at the expense of the update runtime
coefficient. Such a modification of the runtime coefficient was
not mentioned in my entire draft because I did not consider it
especially subtle.

I now realize that multiway mapping trees are more important
than I previously expected because they can be used to define new
magnitudes of runtime. This can be done if M is treated as a
variable rather than a constant. For instance, if M is defined
as the least integer such that M^M > N, then the multiway mapping
trees will produce a log log N improvement in retrieval time at
the expense of a log N/(log log N)^2 worsening of update runtime.
Hence, a log^k N/log log N retrieval and log^{k+1} N/(log log N)^2
worst-case update can be associated with k-dimensional pyramids.
This change in worst-case runtime is produced without altering the
basic log^{k-1} N expected runtime that is associated with Pd(k)
pyramids. It appears that many users may desire to employ this
technique since a high priority is usually assigned to optimizing
retrieval runtime. Further improvements in the magnitude of
retrieval runtime do not appear possible without seriously
damaging the worst-case update runtime associated with Pd(k)
pyramids.
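
The effect of letting M grow with N can be checked with a few lines of
arithmetic. The sketch below is illustrative only: the function names
least_branching and levels are hypothetical, and levels counts how many
M-way branchings are needed to separate N leaves, which is used here as
a stand-in for the height of a mapping tree.

```python
def least_branching(n: int) -> int:
    """Least integer M such that M**M > n."""
    m = 1
    while m ** m <= n:
        m += 1
    return m


def levels(n: int, m: int) -> int:
    """Least h such that m**h >= n, i.e. the levels needed to separate n leaves."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= m
        h += 1
    return h
```

For N equal to one million, least_branching gives M = 8, and a tree
that branches 8 ways needs 7 levels where a binary tree needs 20. The
path climbed through a mapping tree during a retrieval shrinks
accordingly, while each update must maintain nodes with roughly M sons,
which is the source of the worsened update bound.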
iii) satisfies the Nievergelt-Reingold BB(α) condition (the
nature of which can be explained if the ratio of v's
left son's descendants over v's descendants is denoted
as p(v): here BB(α) requires that all nodes of the
tree satisfy α ≤ p(v) ≤ 1 - α)

Let M_ab denote the cardinality of the subset of our initial
file that satisfies a ≤ KEY.0 ≤ b. The theorems of Be-75, Ri-76
and Wi-78a can be easily generalized to show that A(KEY.0, k, α)
associates an O(M_ab^{1-j/k}) worst-case retrieval time with queries
of the form:

a ≤ KEY.0 ≤ b & KEY.i1 = C1 & KEY.i2 = C2 & ... & KEY.ij = Cj

In contrast, the same query in traditional partial match files
would require O(N^{1-j/k}) worst-case runtime (where N denotes the
file's cardinality). Thus the A(KEY.0, k, α) data structure
will have a significantly improved retrieval time, produced
through an allocation of N log N additional units of memory
space.


The point of this example is that the super-B-tree algorithm
has many significant applications beyond the pyramids of the last
section. In the present context, subroutine-calls to the
super-B-tree algorithm will guarantee that any record can be
inserted into and deleted from A(KEY.0, k, α) in O(log N) time
if Rivest-like hash systems are used to define the SDS fields,
and in O(log^2 N) worst-case runtime if the otherwise more flexible
k-d trees of Be-75 and Wi-78a are employed. Several other useful
applications of the super-B-tree procedure are discussed in Wi-7


CONCLUSION

Our goal in this paper was to improve the runtime of
multi-dimensional systems by employing data structures which require
more than O(N) space. It was shown here that this could be
done with data structures that occupy as little as O(N log N)
or O(N log^{k-1} N) space. This result could be quite significant
if the cost of computer memory continues to drop at the same
rate as it has in the past.